For your handwritten solutions, please scan or take a picture of them. Alternatively, you can write them in markdown if you prefer.
Only .ipynb files will be graded for your code.
Compress all the files into a single .zip file.
Do not submit a printed version of your code, as it will not be graded.
We will create a convolutional neural network to classify images of berries, birds, dogs, and flowers. To get started, we need to download the dataset. This dataset will be utilized for both Problem 2 and Problem 3.
(1) Load the provided dataset.
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from google.colab import drive
drive.mount('/content/drive')
train_image = np.load('/content/drive/MyDrive/DL_Colab/DL_data/TL_CAM_train_image.npy')
train_label = np.load('/content/drive/MyDrive/DL_Colab/DL_data/TL_CAM_train_label.npy')
test_image = np.load('/content/drive/MyDrive/DL_Colab/DL_data/TL_CAM_test_image.npy')
test_label = np.load('/content/drive/MyDrive/DL_Colab/DL_data/TL_CAM_test_label.npy')
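Before visualizing, it can help to sanity-check the arrays just loaded. The shapes used below (224×224×3 images, integer labels 0–3) are assumptions inferred from how the data is used later in this notebook, and the arrays here are small stand-ins for the real .npy files:

```python
import numpy as np

# Hypothetical stand-ins for the loaded arrays; in the notebook these come
# from the .npy files on Drive.
train_image = np.zeros((40, 224, 224, 3), dtype=np.float32)
train_label = np.repeat(np.arange(4), 10)

# Images and labels must line up one-to-one.
assert train_image.shape[0] == train_label.shape[0]

# VGG16 expects 224 x 224 RGB inputs.
assert train_image.shape[1:] == (224, 224, 3)

# Four classes: 0 = berry, 1 = bird, 2 = dog, 3 = flower.
classes, counts = np.unique(train_label, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```

A strongly imbalanced class count here would suggest tracking per-class accuracy in addition to the overall number.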
(2) Visualize ten randomly selected images from the training dataset.
index = np.random.choice(train_image.shape[0], 10, replace = False)
plt.figure(figsize = (15, 6))
loc = 1
for i in index:
    plt.subplot(2, 5, loc)
    plt.imshow(train_image[i])
    plt.xticks([])
    plt.yticks([])
    loc += 1
plt.show()
We will utilize the VGG16 architecture to train our dataset. As shown in the image below, VGG16 consists of 16 weight layers (13 convolutional and 3 fully connected) with a substantial number of trainable parameters. Fortunately, deep learning libraries like TensorFlow, Keras, and PyTorch offer models pre-trained on ImageNet, sparing us from the need to design and train a model from the ground up.
(1) Create a VGG16 model using deep learning libraries, such as TensorFlow, Keras, or PyTorch.
vgg16_model = tf.keras.applications.vgg16.VGG16()
vgg16_model.summary()
(2) Revise the original VGG16 architecture. As shown in the image below, we will make modifications exclusively to the fully connected layer section. Additionally, given that we are using pre-trained parameters, the parameters of the feature extraction portion must remain fixed.
vgg16_model.trainable = False
vgg16_model.summary()
block5_pool_layer = vgg16_model.layers[-5].output
conv2d = tf.keras.layers.Conv2D(filters = 1024,
                                kernel_size = (3, 3),
                                activation = 'relu',
                                padding = 'SAME')(block5_pool_layer)
global_average_pooling2d = tf.keras.layers.GlobalAveragePooling2D()(conv2d)
dense = tf.keras.layers.Dense(units = 4, activation = 'softmax')(global_average_pooling2d)
model = tf.keras.Model(inputs = vgg16_model.inputs, outputs = dense)
model.summary()
(3) Train the modified VGG16 model.
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
model.fit(train_image, train_label, batch_size = 128, epochs = 5)
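One caveat worth noting: the pretrained VGG16 weights assume "caffe"-style inputs, which is what `tf.keras.applications.vgg16.preprocess_input` produces by converting RGB to BGR and subtracting the ImageNet channel means. If the provided .npy arrays use a different scaling, the new head will still adapt during training, but matching the original preprocessing can improve the frozen features. A NumPy sketch of what that preprocessing does (the means are the published ImageNet BGR values):

```python
import numpy as np

def vgg16_caffe_preprocess(x):
    """Replicate Keras vgg16.preprocess_input: RGB -> BGR, subtract ImageNet means.

    Assumes x is an array of RGB pixel values in [0, 255] with shape (..., 3).
    """
    x = np.asarray(x, dtype=np.float32)
    x = x[..., ::-1]                                 # RGB -> BGR
    x = x - np.array([103.939, 116.779, 123.68],
                     dtype=np.float32)               # per-channel means (BGR order)
    return x

# A single mid-gray pixel: all channels 128.
out = vgg16_caffe_preprocess(np.full((1, 1, 3), 128.0))
```

This is a sketch for intuition; in practice one would call `tf.keras.applications.vgg16.preprocess_input` directly on the image batch before `model.fit`.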
(4) Print your accuracy with the test dataset.
test_loss, test_acc = model.evaluate(test_image, test_label, verbose = 0)
print('Accuracy: {:.2f} %'.format(test_acc*100))
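Overall accuracy can mask per-class behavior. A small NumPy sketch of a per-class breakdown, using illustrative stand-in arrays (in the notebook, `pred_label` would come from `np.argmax(model.predict(test_image), axis = 1)`):

```python
import numpy as np

# Illustrative stand-ins for the notebook's test labels and model predictions.
test_label = np.array([0, 0, 1, 1, 2, 2, 3, 3])
pred_label = np.array([0, 0, 1, 2, 2, 2, 3, 1])

# Accuracy restricted to the samples of each class.
for c in range(4):
    mask = test_label == c
    acc = np.mean(pred_label[mask] == c)
    print('class {}: {:.2f}'.format(c, acc))
```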
(1) Visualize the Class Activation Mapping (CAM) results as presented in the provided figure.
def classname(n):
    if n == 0:
        return 'berry'
    elif n == 1:
        return 'bird'
    elif n == 2:
        return 'dog'
    else:
        return 'flower'
conv_layer = model.get_layer(index = -3)            # last Conv2D layer, (7, 7, 1024) output
fc_layer = model.layers[-1].get_weights()[0]        # dense weights, shape (1024, 4)
my_map = tf.matmul(conv_layer.output, fc_layer)     # (7, 7, 4): one activation map per class
CAM = tf.keras.Model(inputs = model.inputs, outputs = my_map)
idx = np.random.choice(test_image.shape[0], 3, replace=False)
cam_x = []
pred_y = []
for i in idx:
    # argmax over the four softmax outputs; cast to a plain int so the
    # later classname(pred_y[i]) comparison operates on a scalar
    pred = int(np.argmax(model.predict(test_image[[i]])))
    predCAM = CAM.predict(test_image[[i]])
    attention = predCAM[:, :, :, pred]
    attention = np.abs(np.reshape(attention, (7, 7)))
    resized_attention = cv2.resize(attention,
                                   (224, 224),
                                   interpolation = cv2.INTER_CUBIC)
    cam_x.append(resized_attention)
    pred_y.append(pred)
plt.figure(figsize = (6, 9))
for i in range(3):
    plt.subplot(3, 2, 2 * i + 1)
    plt.imshow(test_image[idx[i]])
    plt.title('True: {} / Pred: {}'.format(classname(test_label[idx[i]]), classname(pred_y[i])), fontsize = 15)
    plt.axis('off')
    plt.subplot(3, 2, 2 * i + 2)
    plt.imshow(test_image[idx[i]])
    plt.imshow(cam_x[i], 'jet', alpha = 0.5)
    plt.title('Class Activation Map', fontsize = 15)
    plt.axis('off')
plt.show()
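The CAM construction above rests on an identity: multiplying the conv feature maps by the dense-layer weight matrix produces, per class, the same weighted sum of channels that (after global average pooling) feeds that class's logit. A small NumPy check of this identity, with toy shapes standing in for the (7, 7, 1024) features and (1024, 4) weights:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((1, 7, 7, 8))   # toy conv output, 8 channels
weights = rng.standard_normal((8, 4))          # toy dense weights, 4 classes

# CAM as computed in the notebook: matmul over the channel axis.
cam = features @ weights                       # shape (1, 7, 7, 4)

# Equivalent view: for class c, a weighted sum of the 8 feature maps.
c = 2
manual = sum(weights[k, c] * features[0, :, :, k] for k in range(8))
assert np.allclose(cam[0, :, :, c], manual)

# Spatially averaging the CAM reproduces the class logits (GAP then
# dense with no bias), since mean and matmul are both linear.
logit = features.mean(axis = (1, 2)) @ weights  # shape (1, 4)
assert np.allclose(cam.mean(axis = (1, 2)), logit)
```

This is why no retraining is needed to extract the maps: the visualization reuses exactly the weights the classifier already learned.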